An alternative pattern for objects in Python

Python is mostly a very nice language that makes for smooth and readable code. An exception is the common way of coding classes. First, it clutters methods with self references. Second, if you choose to use the double underscore convention for indicating a “private” attribute, each reference to it is preceded by self.__. This can destroy much of the readability. If you doubt me, take a look at ./classic/VertexPQ.py for a particularly self.__ infested example.

Here's a prototype example of a Python class (which doesn't look too bad, because it's small and only has one attribute):

class T:
    def __init__(self, x):
        self.__x = x

    def print_x(self):
        print(self.__x)

    def times_x(self, multiplier):
        return multiplier * self.__x

    def change_x(self, new_x):
        self.__x = new_x

Some people seem to think referencing through self is a good thing, and do the same in, e.g., Java, when they don't even have to, writing this.x rather than plain x to reference an attribute. In my view, that's clumsy, and stops the flow of reading. When you want x you should just have to think x, not “the attribute called x in the object for which the current code was invoked”. If it's important to remember that x is an object attribute, you should instead give it a name that makes that obvious in a natural way.

The double underscore naming convention is a way of getting around that Python doesn't have a simple way of limiting access to attributes. Referencing double-underscore names at least makes access to them look conspicuous (even more so since Python actually name-mangles even more garbage onto them, which you have to mimic when referencing them from outside). Fine, but the problem is references in class methods also look conspicuous (and get difficult to read), which of course they aren't.

I'm not alone in not being crazy about double underscores. But just getting rid of them is no solution, because you want some way of keeping outside code from leisurely referencing “private” attributes. (Sure, you can use one underscore instead of two – which actually appears to be what the Python tutorial suggests, except when you want to protect agains name clashes that might result from inheritance – but that just halves the problem at best, it doesn't remove it.) Students are often taught that this naming is the proper way to do privates in Python.

The solution to both of our problems

Another language that lacks a language construct to hide private attributes from the outside is Javascript. The way I've learned to get around that, from Javascript: The Good Parts by Douglas Crockford, is to skip classes (which were badly designed in Javascript anyway) and use the closure of the constructor function as storage for the object.

In Python, I suggest keeping classes for objects, but still use the closure of the constructor pattern. Here's a first attempt, which uses the pattern completely manually, at a replacement for the example above.

class T:
    def __init__(self, x):
        def print_x():
            print(x)

        def times_x(multiplier):
            return multiplier * x

        def change_x(new_x):
            nonlocal x
            x = new_x

        self.print_x = print_x
        self.times_x = times_x
        self.change_x = change_x

Most of the clutter is gone, and as a bonus, x is now truly private: there is no way it can be accessed by name from the outside, except through the methods intended for it. (You can still hack into it if you know it's position in the __closure__ of the methods.)

However, the three assignments at the end that “export” the methods are annoying. We get rid of them by the use of a Python decorator.

from clo_decorators import clo_method

class T:
    def __init__(self, x):
        @clo_method(self)
        def print_x():
            print(x)

        @clo_method(self)
        def times_x(multiplier):
            return multiplier * x

        @clo_method(self)
        def change_x(new_x):
            nonlocal x
            x = new_x

You find clo_method in my ./closure/clo_decorators.py. That also contains a clo_property that can be used to let users of the class reference properties as if they were regular object attributes, but access to them go through specified accessor methods. It's likely that I didn't code this in the best possible way, but it works.

When T is used the way it's supposed to, object closure implementation variant behaves exactly the classic self.__ implementation. There may be performance differences, however, which we get to below.

A larger example and the performance aspect

I teach a basic algorithms class based on the book Algorithms, 4th Edition by Sedgewick and Wayne, which has most of its examples in Java. Sometimes I get students who are not used to writing in Java, and I allow them to use other languages for their assignments. To help the students that code in Python with a graph algorithm assignment, I ported a few of the graph classes from the book to Python. To demonstrate the closure object pattern, I've done this both with the classic Python object pattern and with the closure object pattern using my clo_decorators. You can find the code here:

In particular, have a glance at the closure binary heap implementation ./closure/VertexPQ.py compared to the classic implementation ./classic/VertexPQ.py to get an idea of how much cleaner a relatively complex data structure gets with the closure pattern.

The example programs ./demo/test_bfs_random.py and ./demo/test_dijkstra_random.py each take two integer parameters to generate random graphs that they run the algorithms on: the first is the number of vertices, the second is roughly the out-degree of each vertex. To stress test the performance of the implementations, you can try them with large graphs. Here are some sample runs of the Dijkstra demo (the one with the heaviest work) on my laptop:

demo$ export PYTHONPATH=../classic
demo$ time py test_dijkstra_random.py 100000 5

real    0m4.503s
user    0m4.419s
sys     0m0.059s

demo$ export PYTHONPATH=../closure
demo$ time py test_dijkstra_random.py 100000 5

real    0m3.525s
user    0m3.452s
sys     0m0.058s

The results are consistent: on a graph with a hundred thousand vertices, and about five times as many edges, time went down from about 4.5 to 3.5 seconds when switching to the closure pattern. I expect that this is because access to local variables inside the methods is faster than the corresponding access to object attributes. (I haven't verified this, but it makes sense.) There's no guarantee that the closure pattern always makes the code faster, there might be other examples where it has the opposite effect, but at least this demonstrates that there's the potential of a performance gain.

Of course, you don't have to choose one or the other for everything. I myself intend to use the classic object pattern in parallel with the closure pattern wherever I find it the more suitable.

Thanks

Comments by Jonas Nockert and Peter Lindberg prompted me to edit some details in the text.

Author: Jesper Larsson

Created: 2020-03-06 Fri 16:31